Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
Abstract
We consider optimizing a smooth convex function f that is the average of a set of differentiable functions f_i, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient f_i′ is bounded by a linear function of the norm of the average gradient f′. We show that under these assumptions the basic stochastic gradient method with a sufficiently small constant step size has an O(1/k) convergence rate, and has a linear convergence rate if f is strongly convex.

1 Deterministic vs. Stochastic Gradient Descent

We consider optimizing a function f that is the average of a set of differentiable functions f_i,

$$\min_{x \in \mathbb{R}^P} f(x) := \frac{1}{N} \sum_{i=1}^{N} f_i(x)$$
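As an illustration of the setting in the abstract, here is a minimal sketch of the basic stochastic gradient method with a constant step size. The test problem is an assumption chosen for this example, not taken from the paper: a consistent least-squares problem with f_i(x) = ½(a_i·x − b_i)², where every f_i is minimized at the same point, so each individual gradient norm can be bounded by a constant multiple of the average gradient norm. The function names, the data, and the step-size choice 1/max‖a_i‖² are all hypothetical.

```python
import random


def sgd_constant_step(A, b, step, iters, seed=0):
    """Basic SGD on f(x) = (1/N) * sum_i 0.5 * (a_i . x - b_i)^2 with a fixed step."""
    rng = random.Random(seed)
    n, p = len(A), len(A[0])
    x = [0.0] * p
    for _ in range(iters):
        i = rng.randrange(n)  # sample one component function uniformly
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        # The gradient of f_i at x is residual * a_i; take a constant-size step.
        x = [xj - step * residual * a for a, xj in zip(A[i], x)]
    return x


def objective(A, b, x):
    """Average objective f(x)."""
    total = 0.0
    for row, bi in zip(A, b):
        residual = sum(a * xj for a, xj in zip(row, x)) - bi
        total += 0.5 * residual ** 2
    return total / len(A)


# Synthetic consistent system (illustrative assumption): b_i = a_i . x_star,
# so every f_i has zero gradient at x_star.
data_rng = random.Random(1)
x_star = [1.0, -2.0, 0.5]
A = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
b = [sum(a * xs for a, xs in zip(row, x_star)) for row in A]

# Constant step no larger than 1 / max_i ||a_i||^2 (a conservative choice).
step = 1.0 / max(sum(a * a for a in row) for row in A)
x_hat = sgd_constant_step(A, b, step, iters=5000)
print(objective(A, b, x_hat))
```

Because the step size never decays, any progress toward the minimizer here comes from the gradients of all f_i shrinking together. On a problem where the f_i do not share a minimizer, the same constant-step iteration would generally only reach a neighborhood of the solution.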
Similar resources
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main condition...
Linear Convergence of Proximal-Gradient Methods under the Polyak-Łojasiewicz Condition
In 1963, Polyak proposed a simple condition that is sufficient to show that gradient descent has a global linear convergence rate. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong-convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the four mai...
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity
We extend the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions. For the deterministic projected subgradient method, we present a global O(1/√T) convergence rate for any convex function which is locally Lipschitz around its minimizers. This approach is based on Shor’s classic subgradient analysis and implies generalizations of the standard convergenc...
Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme under Weak Strong Convexity Assumption
We propose a projected semi-stochastic gradient descent method with mini-batch for improving both the theoretical complexity and practical performance of the general stochastic gradient descent method (SGD). We are able to prove linear convergence under a weak strong convexity assumption. This requires no strong convexity assumption for minimizing the sum of smooth convex functions subject to a c...
Escaping Saddles with Stochastic Gradients
We analyze the variance of stochastic gradients along negative curvature directions in certain nonconvex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that contrary to the case of isotropic noise this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensiona...